Tutorial 1. Welcome to HoloViz (HoloViz Overview)

HoloViz is a set of compatible tools to make it easier to see and understand your data at every stage needed by users, research groups, and projects:

  • importing and cleaning
  • initial exploration
  • testing hypotheses
  • generating accurate and convincing figures
  • sharing and deploying live apps
  • improving and adapting each stage over time

Why "Holo"? "holo-", from the Greek root "hólos", means "whole, entire, complete".

Revealing your data (nearly) effortlessly, at every step in your workflow


Workflow from data to decision


If there's no visualization at any of these stages, you're flying blind.

But visualization is often skipped as too hard to construct, particularly for big data.

What if it were simple to visualize anything, anywhere?


Doesn't Python already cover all these stages?

Sure! That's how it ended up with so many tools, which is both good news and bad news:

  • Good news: lots of choices!
  • Bad news: it's too hard to try them all, learn them all, or get them to work together.


HoloViz: Seamless interoperability for browser-based viz tools

Supported by Anaconda, Inc.


Why so many tools?

Because each tool is typically limited to one or two of the stages in the data life cycle, supporting some well but not the others:

  • Simple, quick plots, but limited in capabilities, customization, and compositionality.
  • Deep capabilities for print-based plots, but weak or no support for custom interactivity in web apps.
  • Good interactivity in Jupyter, but weak or no support for batch processing or deployed servers.
  • Good support for deployed servers, but weak or no support for Jupyter or interactive exploration in general.
  • Good support for small datasets or heavily aggregated data, but not suitable for exploring the full raw data.

HoloViz Goals:

  • Full functionality in browsers (not desktop)
  • Full interactivity (inside and out of plots)
  • Focus on Python users, not web programmers
  • Start with data, not coding
  • Work with data of any size
  • Exploit general-purpose SciPy/PyData tools
  • Focus on 2D primarily, with some 3D
  • Avoid entangling your data, code, and viz:
    • Same viz/analysis code in Jupyter, Python, HPC, ...
    • Widgets/apps in Jupyter, standalone servers, web pages
    • Jupyter as a tool, not part of the results

Shortcuts, not dead ends





HoloViz currently covers this subset of viz tools





These tools are the most fully supported and are often entirely sufficient on their own.

HoloViz libraries

To address the above issues, we have developed a set of open-source Python packages to streamline the process of working with small and large datasets (from a few datapoints to billions or more) in a web browser, whether doing exploratory analysis, making simple widget-based tools, or building full-featured dashboards. The main libraries in this ecosystem include:

  • Panel: Assembling objects from many different libraries into a layout or app, whether in a Jupyter notebook or in a standalone servable dashboard
  • hvPlot: Quickly return interactive Bokeh-based HoloViews or GeoViews objects from Pandas, Xarray, or other data structures
  • HoloViews: Declarative objects for instantly visualizable data, building Bokeh plots from convenient high-level specifications
  • GeoViews: Visualizable geographic data that can be mixed and matched with HoloViews objects
  • Datashader: Rasterizing huge datasets quickly as fixed-size images
  • Param: Declaring user-relevant parameters, making it simple to work with widgets inside and outside of a notebook context
  • Colorcet: Perceptually accurate continuous and categorical colormaps for any viz tool

Built on the Python scientific ecosystem

Beyond the specific HoloViz tools, all these approaches work with and often rely upon a wide range of other open-source libraries for their implementation, including:

  • Bokeh: HTML/JS plots in a web browser for Python data structures (used by Panel, hvPlot, HoloViews, GeoViews)
  • Matplotlib: Flexible, publication-quality plots (used by HoloViews, GeoViews; used with Panel)
  • Pandas: Convenient computation on columnar datasets (used by HoloViews and Datashader)
  • Xarray: Convenient computations on multidimensional array datasets (used by hvPlot, HoloViews, and Datashader)
  • Dask: Efficient out-of-core/distributed computation on massive datasets (used by hvPlot, Datashader)
  • Numba: Accelerated machine code for inner loops (used by Datashader)
  • Fastparquet: Efficient storage for columnar data (used with Datashader)
  • Cartopy: Support for geographical data (used by GeoViews; uses a wide range of other lower-level libraries)

The HoloViz tutorial

In this tutorial, we'll focus on an example dataset, using it to illustrate how to:

  • create simple but powerful apps and dashboards out of anything in a Jupyter notebook
  • make simple but powerful plots out of Pandas dataframes and Xarray multidimensional arrays
  • handle columnar data, big data, geo data, array data
  • provide custom interactive links between views of datasets
  • handle the whole process from getting the data in, cleaning it, exploring it visually, creating plots for communication, building dashboards for sharing your analyses, and deploying dashboards.

The tutorial is organized from the most general to the most specific, in terms of tool support. We first look at the Panel package, which works with nearly any plotting library, then hvPlot, which works with nearly any data library and shares an API with many other plotting libraries, and then dive deeper into HoloViz-specific approaches that let you work with large data, provide deep interactivity, and use other advanced features.

To summarize

  • HoloViz provides a set of very high-level tools for interacting with data
  • These tools make it simple to work with multidimensional data, flexibly selecting, visualizing, combining, and comparing it.
  • The tools focus on information visualization in 2D in web browsers, not 3D scientific visualization.
  • Together the tools support a flexible workflow with very little friction between initial exploratory analysis, making interactive apps, building fully deployable dashboards, and revisiting the initial analyses as needed, with changes immediately propagating to the deployed dashboard.
  • The tools are designed around "shortcuts", not "dead ends", and so there is always another level deeper that you can go if you need more power or more customization.

Getting started

Before going further, it's worth exploring some examples of what you can get with HoloViz, to make sure that it covers your needs:

  • https://panel.pyviz.org/gallery
  • https://examples.pyviz.org

Then you can browse through the already-run versions of the HoloViz tutorials to see what they cover and how it all fits together. Everything on this website is a Jupyter notebook that you can run yourself once you follow the installation instructions, so the next step is to try it all out and have fun exploring it!

Tutorial 2. Exploring Pandas Dataframes

If your data is in a Pandas dataframe, it's natural to explore it using the .plot() method (based on Matplotlib). Let's look at a dataset of the number of cases of measles and pertussis (per 100,000 people) over time in each state:

In [1]:
import pandas as pd

df = pd.read_csv('data/diseases.csv.gz')
df[1000:1005]
Out[1]:
Year Week State measles pertussis
1000 1947 13 Alabama 4.93 2.24
1001 1947 14 Alabama 9.96 3.50
1002 1947 15 Alabama 6.39 1.29
1003 1947 16 Alabama 12.03 2.86
1004 1947 17 Alabama 10.37 4.08

Just calling .plot() won't give anything meaningful, because it doesn't know what should be plotted against what:

In [2]:
%matplotlib inline

df.plot();

But with some Pandas operations we can pull out parts of the data that make sense to plot:

In [3]:
import numpy as np

# Total measles cases per year, summed over all states and weeks
by_year = df[["Year", "measles"]].groupby("Year").sum()
by_year.plot();

Here it is easy to see that the 1963 introduction of a measles vaccine brought the cases down to negligible levels.

Exploring Data with hvPlot and Bokeh

The above plots are just static images, but if you import the hvplot package, you can use the same plotting API to get fully interactive plots with hover, pan, and zoom in a web browser:

In [4]:
import hvplot.pandas # noqa: adds hvplot method to pandas objects

by_year.hvplot()
Out[4]:

Here the interactive features are provided by the Bokeh JavaScript-based plotting library. But what's actually returned by this call is something called a HoloViews object, here specifically a HoloViews Curve. HoloViews objects display as a Bokeh plot, but they are actually much richer objects that make it easy to capture your understanding as you explore the data:

In [6]:
import holoviews as hv
vline = hv.VLine(1963).opts(color='black')

m = by_year.hvplot() * vline * \
    hv.Text(1963, 27000, "  Vaccine introduced", halign='left')
m
Out[6]:

At the same time, you can always access the original data for further analysis:

In [7]:
print(m)
m.Curve.I.data.head()
:Overlay
   .Curve.I :Curve   [Year]   (measles)
   .VLine.I :VLine   [x,y]
   .Text.I  :Text   [x,y]
Out[7]:
Year measles
0 1928 16924.34
1 1929 12060.96
2 1930 14575.11
3 1931 15427.67
4 1932 14481.11

With most other plotting libraries, a visualization you construct is a dead end: if you want to change it in some way, you'll need to reconstruct it from scratch with different settings.

Because HoloViews objects preserve your original data, you can now do more with your data than you could before, including anything you could do with the raw data, plus overlaying (as above), laying out in subfigures, slicing, sampling, setting options, and many other operations.

For instance, with HoloViews it's simple to break down the data in different ways. You can inspect each state individually:

In [8]:
measles_agg = df.groupby(['Year', 'State'])['measles'].sum()
by_state = measles_agg.hvplot('Year', groupby='State', width=500, dynamic=False)

by_state * vline
Out[8]:
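The measles_agg object defined above is a pandas Series indexed by (Year, State) pairs, and the cells that follow slice it with .loc[years, states]. Here is a minimal sketch on a handful of made-up rows. The numbers are invented for illustration and are not taken from diseases.csv.gz. It shows how the groupby builds that two-level index, and how the same kind of .loc call selects a year range together with a list of states:

```python
import pandas as pd

# Made-up weekly rows in the same layout as diseases.csv.gz (values are illustrative only)
toy = pd.DataFrame({
    "Year":    [1950, 1950, 1950, 1951, 1951, 1951, 1952, 1952],
    "Week":    [1, 2, 1, 1, 2, 1, 1, 1],
    "State":   ["Texas", "Texas", "Ohio", "Texas", "Texas", "Ohio", "Texas", "Ohio"],
    "measles": [1.0, 2.0, 4.0, 3.0, 1.0, 5.0, 2.0, 6.0],
})

# Sum the weekly values for each (Year, State) pair -> Series with a 2-level MultiIndex
toy_agg = toy.groupby(["Year", "State"])["measles"].sum()

# Slice a range of years on the first level and a list of states on the second level
subset = toy_agg.loc[1950:1951, ["Texas"]]
print(subset)
```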

Or pull out a couple of those to put side by side:

In [9]:
by_state["Texas"].relabel('Texas') * vline + by_state["New York"].relabel('New York') * vline
Out[9]:

Or to compare four states over time by overlaying:

In [10]:
states = ['New York', 'New Jersey', 'California', 'Texas']
measles_agg.loc[1930:2005, states].hvplot(by='State') * vline
Out[10]:

Or by faceting:

In [11]:
measles_agg.loc[1930:2005, states].hvplot('Year', col='State', width=400, height=200, rot=90) * vline
Out[11]:

Or as a different type of plot, such as a bar chart:

In [12]:
measles_agg.loc[1980:1990, states].hvplot.bar('Year', by='State', rot=90)
Out[12]:

Or with additional information, such as error bars:

In [13]:
# One row per Year with the mean and (sample) standard deviation of measles
df_error = df.groupby('Year').agg({'measles': ['mean', 'std']}).xs('measles', axis=1)
df_error.hvplot(y='mean') * hv.ErrorBars(df_error, 'Year').redim.range(mean=(0, None)) * vline
Out[13]:
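hv.ErrorBars needs two values for each year: a central value and an error magnitude. The minimal pandas sketch below uses made-up numbers to show the table that the groupby/agg step above produces, with one row per Year and columns named mean and std. It uses the equivalent, slightly shorter form groupby(...)["measles"].agg([...]). Note that pandas' std computes the sample standard deviation (ddof=1) by default:

```python
import pandas as pd

# Made-up per-state values for two years (illustrative only)
toy = pd.DataFrame({
    "Year":    [1960, 1960, 1960, 1961, 1961, 1961],
    "measles": [2.0, 4.0, 6.0, 1.0, 1.0, 1.0],
})

# One row per Year, with the mean and sample standard deviation of measles
stats = toy.groupby("Year")["measles"].agg(["mean", "std"])
print(stats)
```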

If we really want to invest a lot of time in making a fancy plot, we can customize it to try to show all the yearly data about measles at once:

In [15]:
# np.nansum skips missing weekly values instead of turning the whole cell into NaN
heatmap = df.hvplot.heatmap('Year', 'State', 'measles', reduce_function=np.nansum,
    logz=True, height=500, width=900, xaxis=None, flip_yaxis=True, clim=(1, np.nan))

aggregate = hv.Dataset(heatmap).aggregate('Year', np.mean, np.std)
agg = hv.ErrorBars(aggregate) * hv.Curve(aggregate).opts(xrotation=90)
agg = agg.options(height=200, show_title=False)

marker = hv.Text(1963, 800, u'\u2193 Vaccine introduced', halign='left')
In [16]:
(heatmap + (agg * marker).opts(width=900)).cols(1)
Out[16]:
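The reduce_function passed to the heatmap matters when some weekly values are missing: np.sum propagates a NaN, while np.nansum skips it. The sketch below uses made-up rows to show that difference. It then builds the (State, Year) grid of totals that each heatmap cell represents. The grid uses pandas' groupby/unstack as a rough analogue, not hvPlot's own aggregation code:

```python
import numpy as np
import pandas as pd

# Made-up weekly values; one week of Ohio data in 1950 is missing (NaN)
toy = pd.DataFrame({
    "Year":    [1950, 1950, 1950, 1950, 1951, 1951],
    "State":   ["Ohio", "Ohio", "Texas", "Texas", "Ohio", "Texas"],
    "measles": [4.0, np.nan, 1.0, 2.0, 5.0, 3.0],
})

ohio_1950 = toy["measles"].to_numpy()[:2]
print(np.sum(ohio_1950))     # nan: the missing week poisons the total
print(np.nansum(ohio_1950))  # 4.0: the missing week is skipped

# One cell per (State, Year), like the heatmap; pandas' sum() also skips NaN by default
grid = toy.groupby(["State", "Year"])["measles"].sum().unstack("Year")
print(grid)
```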

If you prefer, you can choose matplotlib to render your HoloViews plots, though you give up the interactive pan, zoom, and hover from Bokeh:

In [17]:
mpl = by_state * hv.VLine(1963).opts(color="black") * \
      hv.Text(1963, 1000, "  Vaccine introduced", halign='left')
hv.output(mpl, backend='matplotlib')

As you can see, these tools make it very quick to explore your data in a browser, and if you choose HoloViews+Bokeh plots, you can have full interactivity with very little code even for quite complex datasets.

Interactive statistical plots

For high-dimensional datasets with additional data variables, we can compose all the above faceting methods as needed.

For instance, let's look at the Iris dataset:

In [18]:
from bokeh.sampledata.iris import flowers as iris

iris.tail()
Out[18]:
sepal_length sepal_width petal_length petal_width species
145 6.7 3.0 5.2 2.3 virginica
146 6.3 2.5 5.0 1.9 virginica
147 6.5 3.0 5.2 2.0 virginica
148 6.2 3.4 5.4 2.3 virginica
149 5.9 3.0 5.1 1.8 virginica

We can now look at all these relationships at once, interactively:

In [19]:
hvplot.scatter_matrix(iris, c='species')
Out[19]:

Tutorial 3. Geospatial data and custom controls

HoloViz is a modular suite of tools, and when you need capabilities not handled by Bokeh and HoloViews (and optionally hvPlot) as above, you can bring those in:

  • GeoViews: Visualizable geographic HoloViews objects
  • Datashader: Rasterizing huge HoloViews objects to images quickly
  • Param: Declarative parameters
  • Panel: Making it simple to work with widgets inside and outside of a notebook context
  • Colorcet: perceptually uniform colormaps for big data

We'll look at a dataset of earthquakes on a map.

In [20]:
import dask.dataframe as dd
import datashader as ds
from colorcet import palette
from holoviews.element.tiles import EsriImagery

topts = hv.opts.Tiles(width=700, height=600, bgcolor='black', 
                      xaxis=None, yaxis=None, show_grid=False)
tiles = EsriImagery().opts(topts) 
earthquakes = dd.read_parquet('data/earthquakes.parq', engine='fastparquet').persist()
colormaps = {n: palette[n] for n in ['fire','bgy','bgyw','bmy','gray','kbc']}

x, y = ds.utils.lnglat_to_meters(earthquakes.longitude, earthquakes.latitude)
projected_earthquakes = earthquakes.assign(x=x, y=y).persist()
In [21]:
import hvplot.dask # noqa: adds hvplot method to dask objects

def view(cmap=colormaps['fire'], alpha=1, reverse_colormap=False):
    cmap = cmap if not reverse_colormap else cmap[::-1]
    return tiles.opts(alpha=alpha) * projected_earthquakes.hvplot.points(
        'x', 'y', datashade=True, cmap=cmap
    )

view()
Out[21]:

As you can see, you can specify geo plots easily. If your HoloViews objects are too big to visualize in a browser directly, you can pass datashade=True (as above) to render them into images dynamically as you zoom and pan.

NOTE: HoloViews includes support for basic web-based map tiles as used here, but if you need to work flexibly with different geographic projections, you'll want to install GeoViews as well. See the notebook on Geographic Data for more information.
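Web map tiles like the EsriImagery source above are in Web Mercator coordinates. That is why the longitude/latitude columns were converted with ds.utils.lnglat_to_meters. Below is a minimal pure-Python sketch of the standard spherical Web Mercator (EPSG:3857) forward projection, using the WGS84 equatorial radius of 6,378,137 m. The helper name lnglat_to_web_mercator is hypothetical. It is intended to follow the same standard formula as Datashader's helper, but it is not Datashader's code:

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 equatorial radius used by Web Mercator (EPSG:3857)


def lnglat_to_web_mercator(lon_deg, lat_deg):
    """Hypothetical helper: project longitude/latitude in degrees to Web Mercator metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    # Northing grows without bound toward the poles, so tiles stop near +/-85.05 degrees
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


print(lnglat_to_web_mercator(0, 0))     # approximately (0, 0)
print(lnglat_to_web_mercator(180, 0))   # easting at the antimeridian, about 20,037,508 m
```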

You can also easily add widgets to control filtering, selection, and other options interactively, either here in the notebook or by putting the same code in a separate file and running it as a standalone server:

In [22]:
import panel as pn
explorer = pn.interact(view, cmap=colormaps, alpha=(0, 1.), reverse_colormap=False)

pn.Row(pn.Column('# Earthquake Explorer', explorer[0]), explorer[1]).servable()
Out[22]:

Here we used the Panel interact function to create a simple app based on the view function, and then we mixed and matched some of its components to lay it out in rows and columns as you see above.

In this simple app, the view function is called whenever any of its parameters change (cmap, alpha, or reverse_colormap), triggering a full rerender. You can get a more responsive interface if you take the time to declare which computations depend on which parameters (see the Deploying Bokeh Apps tutorial).

Either way, the app should work the same here in the notebook (if you have a live Python process) or as a standalone server. To run it as a server, call panel serve with either the name of a Python file containing the above code or simply the name of this notebook, in which case it runs the notebook code and serves any objects marked .servable().

As you can see, the HoloViz tools let you integrate visualization into everything you do, using a small amount of code that reveals your data's properties and captures your understanding of it. The rest of these tutorials will break down each of the topics covered above, showing you step by step how to work with your own data using these tools.

Thanks to all of the HoloViz contributors!

Tutorial 4. Building Panels

Panel is designed to make it simple to add interactive controls to your existing plots and data displays, simple to build apps for your own use in a notebook, simple to deploy apps as standalone dashboards to share with colleagues, and seamlessly shift back and forth between each of these tasks as your needs evolve. If there is one thing you should take away from this tutorial, it's Panel!

Throughout this tutorial we will use a wave heights dataset collected by NOAA, so we will start by loading it:

In [23]:
from load_data import *

df = load_data()
print(df.shape)
df.head()
(230988, 7)
Out[23]:
station latitude longitude time wvht wspd gst
0 41001 34.675 -72.698 2021-01-01T00:40:00Z 2.14 10.0 12.8
1 41001 34.675 -72.698 2021-01-01T01:40:00Z 2.23 10.6 12.9
2 41001 34.675 -72.698 2021-01-01T02:40:00Z 2.07 10.6 13.3
3 41001 34.675 -72.698 2021-01-01T03:40:00Z 1.97 9.2 11.6
4 41001 34.675 -72.698 2021-01-01T04:40:00Z 1.94 9.2 11.3

Next, we import Panel and load its notebook extension:

import panel as pn

pn.extension()

pn.interact

Before we get into the details of how Panel allows you to render and lay out objects we will dive straight in and use Panel's interact function, modeled on the similar function in ipywidgets, to get a simple interactive app immediately. For instance, if you have a function that returns a row of a dataframe given an index, you can very easily make a panel with a widget to control the row displayed.

In [24]:
def select_row(row=0):
    row = df.loc[row].to_frame()
    return row.style.format({"time": lambda t: t.strftime("%c")})

pn.interact(select_row, row=(0, len(df)-1))
Out[24]:

This approach can be used for any function that returns a displayable object, calling the function whenever one of the parameters of that function has changed.
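The widget that interact creates depends on the kind of value you pass for each argument. The printed layout further below shows that an integer (min, max) tuple became an IntSlider. The hypothetical guess_widget function here sketches that kind of dispatch for the argument types used in these tutorials, such as alpha=(0, 1.), reverse_colormap=False, and the cmap=colormaps dictionary. It returns widget names as strings and is a simplified illustration under those assumptions, not Panel's actual implementation, which handles many more cases:

```python
def guess_widget(value):
    """Hypothetical, simplified mapping from an interact-style argument to a widget name."""
    if isinstance(value, bool):          # check bool before int: bool is a subclass of int
        return "Checkbox"
    if isinstance(value, tuple) and len(value) == 2:
        if all(isinstance(v, int) for v in value):
            return "IntSlider"
        return "FloatSlider"             # e.g. alpha=(0, 1.) mixes an int and a float
    if isinstance(value, dict):
        return "Select"                  # dict keys become the option labels
    if isinstance(value, str):
        return "TextInput"
    raise TypeError(f"no widget guess for {value!r}")


print(guess_widget((0, 230987)))              # IntSlider, as in the printed layout
print(guess_widget((0, 1.)))                  # FloatSlider
print(guess_widget(False))                    # Checkbox
print(guess_widget({"fire": [], "bgy": []}))  # Select
```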

In the spirit of "shortcuts, not dead ends", let's see what's in the object returned by interact:

In [25]:
app = pn.interact(select_row, row=(0, len(df)-1))

print(app)
Column
    [0] Column
        [0] IntSlider(end=230987, name='row')
    [1] Row
        [0] HTML(Styler, name='interactive251017')

interact

interact has constructed a Column panel consisting of one Column of widgets (with one widget), and one Row of output (with one HTML pane). This object, once created, is a full compositional Panel object, and can be reconfigured and expanded with additional content if you wish, without breaking the connections between widgets and values:

In [26]:
pn.Column("## Choose a row", pn.Row(app[0], app[1]))
Out[26]:

Hopefully from this simple example you can see the sorts of things Panel can do. In the rest of this section we'll cover some of the items you can use in a panel and how to compose them. In the subsequent section we will dive into how to set up widgets and their relationships explicitly, and then build a custom dashboard as an exercise. For now, we won't show code for any particular plotting library, but if you have a favorite one already, you should be able to use it with Panel in the exercises.

Component types

Before we start building more interactive apps, we will learn about the three main types of components in Panel:

  • Pane: A Pane provides a view of an external object (text, image, plot, etc.) by wrapping it
  • Panel: A Panel lays out multiple components in a row, column, or grid.
  • Widget: A Widget provides input controls to add interactive features to your Panel.

If you ever want to discover how a particular component works, see the reference gallery.

Displaying content

The fundamental concept behind Panel is that it transforms the objects you give it into a viewable object that can be composed into a layout and updated dynamically. In this tutorial we will be building a dashboard visualizing the NOAA wave heights dataset, so let us start by displaying a title using the pn.panel function:

In [27]:
title = pn.panel('## Major Waves Dashboard')

title
Out[27]:
In [28]:
df
Out[28]:
station latitude longitude time wvht wspd gst
0 41001 34.675 -72.698 2021-01-01T00:40:00Z 2.14 10.0 12.8
1 41001 34.675 -72.698 2021-01-01T01:40:00Z 2.23 10.6 12.9
2 41001 34.675 -72.698 2021-01-01T02:40:00Z 2.07 10.6 13.3
3 41001 34.675 -72.698 2021-01-01T03:40:00Z 1.97 9.2 11.6
4 41001 34.675 -72.698 2021-01-01T04:40:00Z 1.94 9.2 11.3
... ... ... ... ... ... ... ...
230983 VBBA3 36.132 -114.412 2021-02-25T05:20:00Z 1.03 13.1 16.2
230984 VBBA3 36.132 -114.412 2021-02-25T05:30:00Z 1.03 13.9 17.5
230985 VBBA3 36.132 -114.412 2021-02-25T05:40:00Z 1.03 13.1 16.9
230986 VBBA3 36.132 -114.412 2021-02-25T05:50:00Z 1.03 13.8 17.0
230987 VBBA3 36.132 -114.412 2021-02-25T06:00:00Z 1.03 13.3 15.5

230988 rows × 7 columns

Before going further, let's also sort the dataframe so that the largest wave heights (wvht) come first:

In [29]:
# sort so that the largest wave heights come first
df_sorted = df.sort_values(by=['wvht'], ascending=False)
df_sorted.head()
Out[29]:
station latitude longitude time wvht wspd gst
107281 46071 51.155 179.001 2021-01-01T00:50:00Z 17.68 NaN NaN
107283 46071 51.155 179.001 2021-01-01T02:50:00Z 17.55 NaN NaN
107284 46071 51.155 179.001 2021-01-01T03:50:00Z 17.30 NaN NaN
107286 46071 51.155 179.001 2021-01-01T05:50:00Z 16.12 NaN NaN
107285 46071 51.155 179.001 2021-01-01T04:50:00Z 15.94 NaN NaN

Panel transformed the str object and wrapped it in a so-called Markdown Pane. The pn.panel function attempts to find the most appropriate representation for different objects whether it is a string, an image, or even a plot. So if we provide the location of a PNG file instead as a path or a URL, the panel function will automatically infer that it should be rendered as an image:

In [30]:
noaa_logo = pn.panel('assets/noaa-lrg.png', height=130)
noaa_logo
Out[30]:

The appropriate representation is resolved using a set of precedences, so it may sometimes be necessary to explicitly declare the type of Pane that is required. For example, if we want to display some HTML, which cannot easily be distinguished from Markdown, we can explicitly declare it by specifying the HTML Pane type from the pn.pane module:

In [31]:
pn.pane.HTML('<marquee width=500><b>Breaking news</b>: Major waves off coast of Rat Islands</marquee>')
Out[31]:

Laying out content

In addition to Pane objects, Panel provides Panel objects that lay out components. The principal layouts are Row and Column, and they act just like regular Python lists:

In [32]:
column = pn.Column(title, noaa_logo, app)

column
Out[32]:

Panels may be nested arbitrarily to construct complex layouts. Internally, Panel calls the pn.panel function on any objects that are not already a known component type. This makes it easy to lay out objects without explicitly wrapping them in a panel component, though wrapping an object explicitly can help ensure that it is the type you expect:

In [33]:
import pandas as pd

# keep only the five largest waves and the columns we want to show
df_top5 = df_sorted.head(5)[['station', 'time', 'wvht']]

row = pn.Row(column,
    pn.Column('### Top 5', pn.panel(df_top5, width=500)))

row
Out[33]:

In the previous section we learned the very basics of working with Panel. Specifically, we looked at the different types of components, how to update them, and how to serve a Panel application or dashboard. However, to start building actual apps with Panel we need to be able to add interactivity by linking different components together. In this section we will learn how to link widgets to outputs to start building some simple interactive applications.

In this section we will once again make use of the wave heights dataset we loaded previously and compute some statistics.

Widgets and reactive components

pn.interact constructs widgets automatically, and these can then be reconfigured. If you want more control, you'll want to instantiate widgets explicitly. A widget is an input control that lets a user change a value through a graphical UI. A simple example is a RangeSlider:

In [34]:
wvht_filter = pn.widgets.RangeSlider(name='Wave Heights', start=0, end=df['wvht'].max())

wvht_filter
Out[34]:

The widget's value is a Parameter that holds a tuple of the selected lower and upper bounds. Parameters are an extended type of Python attribute that declare their type, range, etc., so that other code can interact with them in a consistent way. When we change the range using the widget, the value parameter updates, and vice versa if you change the value parameter manually:

In [35]:
wvht_filter.value
Out[35]:
(0, 17.68)
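Because the value is a plain (lower, upper) tuple, it can be used directly to filter the dataframe. The minimal sketch below uses a few made-up rows standing in for the NOAA data. The helper filter_by_height is hypothetical, written for this example. It uses pandas' Series.between, which includes both endpoints by default:

```python
import pandas as pd

# A few made-up rows standing in for the NOAA wave-height data
waves = pd.DataFrame({"station": ["41001", "41001", "46071", "VBBA3"],
                      "wvht": [2.14, 0.50, 17.68, 1.03]})


def filter_by_height(frame, value_range):
    """Hypothetical helper: keep rows whose wave height lies within a (lower, upper) tuple."""
    lower, upper = value_range
    return frame[frame["wvht"].between(lower, upper)]


# A tuple like the one held by wvht_filter.value
selected = filter_by_height(waves, (1.0, 5.0))
print(selected)
```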

Callbacks

pn.interact and Panel's pn.depends API are very high-level ways of declaring interactive components. Panel also supports the lower-level approach of writing callbacks in response to changes in some parameter, e.g. the value of a widget. All parameters can be watched using the .param.watch API, which calls the provided callback with an event object containing the old and new values of the parameter.

First, we will create a slider, which we will use to select the row of the dataframe that we want to display.

In [36]:
row_slider = pn.widgets.IntSlider(value=0, start=0, end=len(df)-1)

Next we create a Pane to display the current row of the dataframe:

In [37]:
row_pane = pn.panel(df.loc[row_slider.value])
row_pane
Out[37]:

Now that we have defined both the widget and the object we want to update, we can declare a callback to link the two. Assigning a new value to a pane's object parameter updates the display, so in the callback we select the row of the dataframe and assign it to row_pane.object.

In [38]:
def df_callback(event):
    row_pane.object = df.loc[event.new]

Lastly, we have to register this callback. To do so, we pass the callback and the name of the parameter that should trigger it to the slider's .param.watch method:

In [39]:
row_slider.param.watch(df_callback, 'value')
Out[39]:
Watcher(inst=IntSlider(end=230987), cls=<class 'panel.widgets.slider.IntSlider'>, fn=<function df_callback at 0x18def40d0>, mode='args', onlychanged=True, parameter_names=('value',), what='value', queued=False, precedence=0)

Now that everything is connected up we can put both the widget and the pane in a panel and display them:

In [42]:
pn.Column(row_slider, row_pane, width=400)
Out[42]:

As you can see, this process is slightly more laborious than pn.interact or even the pn.depends approach, but doing it in this way should help you see how everything fits together and can be useful to more precisely control callbacks that update particular parameters or the contents of a larger layout.
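To make the mechanics concrete, here is a toy, pure-Python version of the watch pattern. The Watchable class and the Event tuple are hypothetical stand-ins written for this sketch. Panel's real implementation comes from the Param library and does much more. The core idea is the same, though: registered callbacks receive an event carrying the old and new values whenever the watched value actually changes, which matches the onlychanged=True shown in the Watcher output above:

```python
from collections import namedtuple

# Hypothetical stand-in for the event object a watcher receives
Event = namedtuple("Event", ["name", "old", "new"])


class Watchable:
    """Toy object with one watchable 'value' attribute (not Param's implementation)."""

    def __init__(self, value):
        self._value = value
        self._watchers = []

    def watch(self, callback):
        self._watchers.append(callback)

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        old, self._value = self._value, new
        if new != old:  # like onlychanged=True: only fire on real changes
            for callback in self._watchers:
                callback(Event("value", old, new))


# Mirror the row_slider/row_pane example with plain Python objects
rows = ["row 0", "row 1", "row 2"]
pane_state = {"object": rows[0]}  # stands in for row_pane


def df_callback(event):
    pane_state["object"] = rows[event.new]


slider = Watchable(0)
slider.watch(df_callback)
slider.value = 2
print(pane_state["object"])  # row 2
```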

Tutorial 5. Revisiting plotting with geospatial data

If you have tried to visualize a pandas.DataFrame before, then you have likely encountered the Pandas .plot() API. This basic plotting interface uses Matplotlib to render static PNGs or SVGs in a Jupyter notebook using the inline backend (or interactive figures via %matplotlib notebook or %matplotlib widget) and for exporting from Python, with a command that can be as simple as df.plot() for a DataFrame with one or two columns.

The Pandas .plot() API has emerged as a de-facto standard for high-level plotting APIs in Python, and is now supported by many different libraries that use other underlying plotting engines to provide additional power and flexibility. Thus learning this API allows you to access capabilities provided by a wide variety of underlying tools, with relatively little additional effort. The libraries currently supporting this API include:

  • Pandas -- Matplotlib-based API included with Pandas. Static or interactive output in Jupyter notebooks.
  • xarray -- Matplotlib-based API included with xarray, based on pandas .plot API. Static or interactive output in Jupyter notebooks.
  • hvPlot -- HoloViews and Bokeh-based interactive plots for Pandas, GeoPandas, xarray, Dask, Intake, and Streamz data.
  • Pandas Bokeh -- Bokeh-based interactive plots, for Pandas, GeoPandas, and PySpark data.
  • Cufflinks -- Plotly-based interactive plots for Pandas data.
  • Plotly Express -- Plotly-Express-based interactive plots for Pandas data; only partial support for the .plot API keywords
  • PdVega -- Vega-lite-based, JSON-encoded interactive plots for Pandas data.

In this notebook we'll explore what is possible with the default .plot API and demonstrate the additional capabilities of .hvplot, using the same dataset. Of course, this particular dataset is just an example; the same approach can be used with just about any tabular dataset.

In [43]:
lat_min = df.latitude.min()
lat_max = df.latitude.max()

lon_min = df.longitude.min()
lon_max = df.longitude.max()

f"Wave heights from {lat_min, lat_max} latitude to {lon_min, lon_max} longitude"
Out[43]:
'Wave heights from (-14.265, 60.794) latitude to (-172.167, 179.001) longitude'

Using Pandas .plot

The first thing that we'd like to do with this data is visualize the location of every measurement, so we would like to make a scatter or points plot where x='longitude' and y='latitude'.

If you are familiar with the pandas.plot API, you might expect to execute df.plot.scatter(x='longitude', y='latitude'). Feel free to try this out in a new cell, but with more than 230,000 rows the result can be slow to draw and is heavily overplotted. To make the data more manageable for now, we'll briefly use just a fraction (10%) of it and call that small_df.

In [44]:
%matplotlib inline
small_df = df.sample(frac=.1)
small_df.shape
Out[44]:
(23099, 7)
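df.sample(frac=...) draws a random subset of rows, so the exact rows (though not the count) change every time the cell runs. The minimal sketch below uses a made-up frame with the same number of rows as the wave-height data. It shows that frac=.1 keeps round(0.1 * 230988) = 23099 rows, matching the shape above, and that passing random_state makes the sample reproducible:

```python
import pandas as pd

# Made-up frame with the same number of rows as the wave-height data
frame = pd.DataFrame({"row_id": range(230988)})

sample_a = frame.sample(frac=.1, random_state=42)
sample_b = frame.sample(frac=.1, random_state=42)

print(len(sample_a))                          # 23099
print(sample_a.index.equals(sample_b.index))  # True: same seed, same rows
```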

Now we have a smaller dataset with many fewer observations. We can use that to test out our visualizations before ramping back up to the full dataset.

In [45]:
small_df.plot.scatter(x='longitude', y='latitude');

Using .hvplot

As you can see above, the Pandas API gives you a usable plot very easily, where you can start to see the density of waves in the western hemisphere. You can make a very similar plot with the same arguments using hvplot.

In [46]:
import hvplot.pandas
small_df.hvplot.scatter(x='longitude', y='latitude', alpha=0.1)
Out[46]:

Here, unlike with the Pandas .plot(), there is a default hover action on the datapoints to show the location values, and you can also pan and zoom to focus on any particular region of interest.

You might have noticed that many of the dots in the scatter that we've just created lie on top of one another. This is called "overplotting" and can be avoided in a variety of ways, such as by making the dots slightly transparent, or binning the data. These approaches have the downside of introducing bias because you need to choose the alpha or the edges of the bins, and in order to do that, you have to make assumptions about the data. For an initial exploration of a new dataset, it's much safer if you can just see the data, before you impose any assumptions about its form or structure.
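To see why the choice of alpha is itself an assumption, consider N identical points drawn on top of each other, each with opacity alpha. With standard "over" compositing, the combined opacity is 1 - (1 - alpha)^N. As a result, pixels with more than a few dozen overlapping points all look essentially the same. Here is a quick pure-Python check using the alpha=0.1 from the hvplot call above:

```python
def stacked_opacity(n_points, alpha):
    """Opacity after compositing n identical points, each drawn with the given alpha."""
    return 1 - (1 - alpha) ** n_points


for n in (1, 10, 44, 1000):
    print(n, round(stacked_opacity(n, 0.1), 3))
```

With alpha=0.1, a pixel covered by 44 points is already more than 99% opaque, so 44 overlapping points and 1000 overlapping points become visually indistinguishable.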

Datashader

To avoid some of the problems of traditional scatter/point plots we can use Datashader, which aggregates data into each pixel without any arbitrary parameter settings. In hvplot we can activate this capability by setting datashade=True.

In [47]:
small_df.hvplot.scatter(x='longitude', y='latitude', datashade=True, dynspread=True)
Out[47]:

Now you can see all of the rich detail in this sample of tens of thousands of wave-height measurements. If you have a live Python process running, you can zoom in and see additional detail at each zoom level, without tuning any parameters or making any assumptions about the form or structure of the data. We'll come back to Datashader later, but for now the important thing to know about it is that it lets us work with arbitrarily large datasets in a web browser conveniently.
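The core step behind Datashader's point rendering is aggregation. Every point is counted into the pixel of a fixed-size grid that it falls in, and the resulting counts, not the individual points, are then colormapped. Below is a rough NumPy sketch of that count aggregation. It uses np.histogram2d on random stand-in coordinates rather than Datashader's own Canvas machinery. Note that histogram2d returns an x-by-y array, so a real image would be its transpose:

```python
import numpy as np

rng = np.random.default_rng(0)
# Random stand-in coordinates (not the NOAA data): one million points
lon = rng.uniform(-180, 180, 1_000_000)
lat = rng.uniform(-60, 60, 1_000_000)

# Count points per pixel on a fixed 300 x 200 grid, however many points there are
width, height = 300, 200
counts, _, _ = np.histogram2d(lon, lat, bins=[width, height],
                              range=[[-180, 180], [-60, 60]])

print(counts.shape)       # (300, 200): size depends only on the grid
print(int(counts.sum()))  # 1000000: every point is accounted for
```

The output size stays fixed no matter how many points go in, which is why this approach scales to datasets far too large to send to a browser point by point.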

Note that, unlike the other .plot libraries, hvplot doesn't just target Pandas objects. The same .hvplot() API can be used with:

  • Pandas : DataFrame, Series (columnar/tabular data)
  • xarray : Dataset, DataArray (labelled multidimensional arrays)
  • Dask : DataFrame, Series (distributed/out of core arrays and columnar data)
  • Streamz : DataFrame(s), Series(s) (streaming columnar data)
  • Intake : DataSource (data catalogues)
  • GeoPandas : GeoDataFrame (geometry data)
  • NetworkX : Graph (network graphs)

Statistical Plots¶

Let's dive into some of the other capabilities of .plot() and .hvplot(), starting with the frequency of different wind gusts.

As a first pass, we'll plot a histogram with .plot.hist on the small data, then with .hvplot.hist on the full dataset.

In [48]:
small_df.plot.hist(y='gst');
In [49]:
df.hvplot.hist(y='gst', bin_range=(0, 10), bins=50)
Out[49]:
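Here, bins=50 together with bin_range=(0, 10) requests 50 equal-width bins spanning 0 to 10, so each bin is 0.2 units wide. The sketch below assumes the histogram follows the same convention as NumPy's np.histogram, where values outside the range are left out of the counts rather than piled into the end bins. Check the hvPlot documentation if that detail matters for your analysis. The gust values are made up.

```python
import numpy as np

# Made-up gust values; 12.0 lies outside the requested range.
gst = np.array([0.1, 0.15, 3.3, 9.99, 12.0])

counts, edges = np.histogram(gst, bins=50, range=(0, 10))

bin_width = edges[1] - edges[0]
print(len(edges))           # 51 edges delimit 50 bins
print(round(bin_width, 3))  # width of each bin
print(counts.sum())         # the out-of-range value is not counted
```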

Adding a third dimension¶

Now let's filter the waves to include only the really gusty ones. We can add an extra dimension to the visualization by using color in addition to x and y.

In [50]:
%matplotlib inline
# Keep only the observations with gusts of at least 10
most_severe = df[df.gst >= 10]
# Color each point by its gust value
most_severe.plot.scatter(x='longitude', y='latitude', c='gst');

Here is the analogous version using hvplot, where we grab the handle high_wspd_scatter so we can inspect the return value:

In [51]:
high_wspd_scatter = most_severe.hvplot.scatter(x='longitude', y='latitude', c='gst')
print(high_wspd_scatter)
high_wspd_scatter
Out[51]:

Note: The notion of a 'scatter' plot implies that there is an independent variable and at least one dependent variable. This is reflected in the printed representation, where the independent variables are in square brackets and the dependent ones are in parentheses. It shows that this scatter object treats latitude as dependent on longitude, which is incorrect. We'll fix the dimensions later.

First, let's adjust the options to create a better plot. We'll start by using colorcet to get a colormap that doesn't have white at one end, since white would be ambiguous against the page background. We can choose one from the colorcet website and preview it with colorcet's HoloViews/Bokeh-based plotting module to make sure it looks good.

In [52]:
import colorcet as cc
from colorcet.plotting import swatch
swatch('CET_L4')
Out[52]:

We'll reverse the colors to align dark reds with gustier waves.

In [53]:
wspd_cmap = cc.CET_L4[::-1]
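Reversing with [::-1] works because colorcet exposes palettes such as cc.CET_L4 as plain Python lists of colors, so slicing flips which end is assigned to the largest values. The sketch below uses a hypothetical three-color stand-in for a dark-to-light palette. It maps values with a simplified linear lookup that splits the value range into equal-width bins, one per color. This mimics the idea behind a linear color mapper but is not Bokeh's exact implementation. The helper name lookup_color and the 10 to 30 gust range are made up.

```python
import math


def lookup_color(value, vmin, vmax, palette):
    """Map value linearly onto palette using equal-width bins, one per color.

    Hypothetical, simplified illustration; values outside [vmin, vmax]
    are clamped to the first or last color.
    """
    t = (value - vmin) / (vmax - vmin)
    index = math.floor(t * len(palette))
    index = min(max(index, 0), len(palette) - 1)
    return palette[index]


dark_to_light = ["black", "red", "yellow"]  # hypothetical stand-in for CET_L4
light_to_dark = dark_to_light[::-1]          # analogous to cc.CET_L4[::-1]

# After reversing, the gustiest values map to the darkest color
# instead of the lightest.
print(lookup_color(30, 10, 30, dark_to_light))
print(lookup_color(30, 10, 30, light_to_dark))
```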

Besides fixing the colormap, we'll switch from scatter to points, which correctly treats longitude and latitude as independent variables. We'll also add some extra columns to the hover text and give the plot a title.

In [54]:
gusty_points = most_severe.hvplot.points(
    x='longitude', y='latitude', c='gst', hover_cols=['place', 'time'],
    cmap=wspd_cmap, title='Waves with gusts >= 10'
)

gusty_points
Out[54]:

When you hover over the points, you'll see the place and time of each wave observation in addition to the gust speed and lat/lon.

Overlay with a tiled map¶

That colormap is better, and we can roughly make out the outlines of the continents, but the visualization would be much easier to parse with a base map underneath. To add one, we'll import a tile element from HoloViews, namely the OSM tile source from OpenStreetMap, which uses the Web Mercator projection:

In [55]:
from holoviews.element.tiles import OSM
OSM()
Out[55]:

Note that as you zoom in, the map becomes more and more detailed, downloading tiles as necessary. To overlay data on this basemap, we need to project the wave locations into the Web Mercator coordinate system.

To do this, we will use the lnglat_to_meters function in the datashader.utils module to map longitude and latitude to easting and northing, respectively:

In [56]:
from datashader.utils import lnglat_to_meters

# Convert longitude/latitude in degrees to Web Mercator easting/northing in meters
x, y = lnglat_to_meters(most_severe.longitude, most_severe.latitude)
# Add the projected coordinates as new columns, aligned on the DataFrame index
most_severe_projected = most_severe.assign(easting=x, northing=y)
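For reference, Web Mercator (EPSG:3857) on a sphere of radius R = 6378137 m uses easting = R * lambda and northing = R * ln(tan(pi/4 + phi/2)), with longitude lambda and latitude phi in radians. We believe this is what lnglat_to_meters computes, but we are restating the standard formulas rather than quoting Datashader's source. The helper web_mercator below is hypothetical and is meant only as a sketch.

```python
import numpy as np

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator


def web_mercator(longitude, latitude):
    """Project lon/lat in degrees to Web Mercator (easting, northing) in meters.

    Hypothetical helper restating the spherical Web Mercator formulas.
    """
    lam = np.radians(longitude)
    phi = np.radians(latitude)
    easting = EARTH_RADIUS_M * lam
    northing = EARTH_RADIUS_M * np.log(np.tan(np.pi / 4 + phi / 2))
    return easting, northing


# Longitude 180 maps to pi * R (about 20,037,508 m); the equator maps to 0.
e, n = web_mercator(np.array([0.0, 180.0]), np.array([0.0, 45.0]))
print(e, n)
```

Because the helper uses only NumPy ufuncs, passing pandas Series returns Series that keep the original index. That index alignment is what lets the projected columns line up with the right rows.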

We can now overlay our points on the OSM tile source. Instead of overlaying the tile element explicitly, we can simply specify tiles='OSM' as a string:

In [57]:
most_severe_projected.hvplot.points(
    x='easting', y='northing', c='gst', hover_cols=['place', 'time'],
    cmap=wspd_cmap, title='Waves with gusts >= 10', tiles='OSM',
    line_color='black'
)
Out[57]:

Note that the Web Mercator projection is only one of many possible projections used when working with geospatial data. If you need to work with these different projections, you can use the GeoViews extension to HoloViews that makes elements aware of the projection they are defined in and automatically projects into whatever coordinates are needed for display.

Thank you! More links¶

You will find extensive support material on the websites for each package. You may find these links particularly useful during the tutorial:

  • HoloViz Tutorial
  • hvPlot user guide: Guide to the plots available via .hvplot()
  • HoloViews reference gallery: Visual reference of all HoloViews elements and containers, along with some other components
  • Panel reference gallery: Visual reference of all panes, layouts and widgets.